Trend Detection in Folksonomies

Authors

  • Andreas Hotho
  • Robert Jäschke
  • Christoph Schmitz
  • Gerd Stumme
Abstract

As the number of resources on the web far exceeds the number of documents one can track, it becomes increasingly difficult to remain up to date on one's own areas of interest. The problem becomes more severe with the increasing fraction of multimedia data, from which it is difficult to extract a conceptual description of the contents. One way to overcome this problem is offered by social bookmarking tools, which are rapidly emerging on the web. In such systems, users set up lightweight conceptual structures called folksonomies and thus overcome the knowledge acquisition bottleneck. As more and more people participate in the effort, the common vocabulary in use becomes more and more stable. We present an approach for discovering topic-specific trends within folksonomies. It is based on a differential adaptation of the PageRank algorithm to the triadic hypergraph structure of a folksonomy. The approach works for any kind of data, as it does not rely on the internal structure of the documents. In particular, this makes it possible to consider different data types in the same analysis step. We run experiments on a large-scale real-world snapshot of a social bookmarking system.

1 Social Resource Sharing and Folksonomies

With the growth of the web, both the number and the heterogeneity of the types of available resources have increased dramatically. The management of such a collection of resources includes many subtasks like search, retrieval, clustering, reasoning, and knowledge discovery. For all these tasks, some sort of conceptual description of the documents is essential. Many approaches have been applied successfully for years to extract such descriptions from text documents, ranging from the bag-of-words model for information retrieval to ontology learning. Up to now, there are fewer solutions for images, videos, audio tracks and music data.
The way from the features of the different resources to a conceptual description is generally far more difficult for multimedia data. Furthermore, these techniques have to be developed separately for each kind of data. For applications like the detection of trends in a collection of resources consisting of several types of (multimedia) data, which is the topic of this paper, one would first have to define a common format for the representation of the conceptual model, plus extraction techniques for each of the data types.

Complementing the extraction of conceptual descriptions from the documents themselves, social resource sharing tools are currently emerging on the web as part of what is called “social software” or “Web 2.0”. In these user-centric publishing and knowledge management platforms, the user provides a conceptual description for each document in the form of a collection of ‘tags’, i.e., arbitrary, user-defined catchwords. As this description is independent of the format of the resource, the social tagging approach provides a unified model for all kinds of resources, including in particular multimedia formats.

Social resource sharing tools, such as Flickr³ or del.icio.us⁴ (see Fig. 1), have acquired large numbers of users within less than two years. The social photo gallery Flickr, for instance, is estimated to have over a million users. The reason for the immediate success of these systems is that no specific skills are needed for participating, and that these tools yield immediate benefit for each individual user (e.g. organizing one's bookmarks in a browser-independent, persistent fashion) without too much overhead. Large numbers of users have created huge amounts of information within a very short period of time.
The frequent use of these systems shows clearly that web- and folksonomy-based approaches are able to overcome the knowledge acquisition bottleneck, which was a serious handicap for many knowledge-based systems in the past.

Social resource sharing systems are web-based systems that allow users to upload their resources and to label them. All these systems share the same core functionality. Once a user is logged in, he can add a resource to the system and assign arbitrary labels, so-called tags, to it. Resources can be almost anything. In systems such as our BibSonomy,⁵ for instance, resources are bookmarks and bibliographic references, in Flickr they are photos, in last.fm⁶ music files, in YouTube⁷ videos, and in 43Things⁸ even goals in private life. The collection of all assignments of a user is called his personomy; the collection of all personomies is called the folksonomy. The user can also explore the personomies of the other users in all dimensions: for a given user he can see the resources that user has uploaded, together with the tags assigned to them; when clicking on a resource he sees which other users have uploaded this resource and how they tagged it; and when clicking on a tag he sees who assigned it to which resources (see Fig. 1).

The word ‘folksonomy’ is a blend of the words ‘taxonomy’ and ‘folk’, and stands for conceptual structures created by the people. Folksonomies are thus a bottom-up complement to more formalized Semantic Web technologies, as they rely on emergent semantics [17, 18] which result from the converging use of the same vocabulary. In this paper, we analyze this emergence of common semantics by exploring trends in the folksonomy. Since the structure of a folksonomy is symmetric with respect to the dimensions ‘user’, ‘tag’, and ‘resource’, we can apply the same approach to study upcoming users, upcoming tags, and upcoming resources. We present a technique for analyzing the evolution of topic-specific trends.
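The data model described above (tag assignments, personomies, and the three ways of browsing) can be illustrated with a minimal sketch. It assumes that a folksonomy is stored simply as a set of (user, tag, resource) triples; the class name, function names, and toy data are hypothetical and do not come from any of the systems mentioned.

```python
from collections import defaultdict
from typing import NamedTuple


class TagAssignment(NamedTuple):
    """One (user, tag, resource) assignment; field names are illustrative."""
    user: str
    tag: str
    resource: str


# Hypothetical toy data, not taken from any real system.
folksonomy = {
    TagAssignment("alice", "python", "http://example.org/a"),
    TagAssignment("alice", "web", "http://example.org/b"),
    TagAssignment("bob", "python", "http://example.org/a"),
    TagAssignment("bob", "programming", "http://example.org/a"),
}


def personomy(assignments, user):
    """All assignments made by one user."""
    return {a for a in assignments if a.user == user}


def tags_by_user_for_resource(assignments, resource):
    """For a resource: which users posted it and how each of them tagged it."""
    result = defaultdict(set)
    for a in assignments:
        if a.resource == resource:
            result[a.user].add(a.tag)
    return dict(result)


def postings_for_tag(assignments, tag):
    """For a tag: who assigned it to which resources."""
    return {(a.user, a.resource) for a in assignments if a.tag == tag}
```

Because every query is just a filter on one of the three fields, the sketch also shows the symmetry between users, tags, and resources that the text relies on.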
Our approach is based on our FolkRank algorithm [10], a differential adaptation of the PageRank algorithm [3] to the tripartite hypergraph structure of a folksonomy. Compared to pure co-occurrence counting, FolkRank also takes into account elements that are related to the focus of interest only indirectly, through the underlying graph/folksonomy. In particular, FolkRank ranks synonyms higher, even though they usually do not occur together in the same bookmark posting. With FolkRank, we compute topic-specific rankings on users, tags, and resources. In a second step, we can then compare these rankings for snapshots of the system at different points in time. We can discover both the absolute rankings (who is in the Top Ten?) and winners and losers (who rose/fell most?).

Fig. 1. Del.icio.us, a popular social bookmarking system.

³ http://www.flickr.com/
⁴ http://del.icio.us
⁵ http://www.bibsonomy.org
⁶ http://www.last.fm
⁷ http://www.youtube.com/
⁸ http://www.43things.com/

The contributions of this work are:

  • Ranking in folksonomies. We describe a general ranking scheme for folksonomy data. The scheme allows in particular for topic-specific ranking.
  • Trend detection. We introduce a trend detection measure which makes it possible to determine which tags, users, or resources have been gaining or losing popularity in a given time interval. Again, this measure can be focused on specific topics.
  • Application to arbitrary folksonomy data. As the ranking is based solely on the graph structure of the folksonomy, which is resource-independent, we can also apply it to any kind of resource, including in particular multimedia objects, but also office documents, which typically do not have a hyperlink structure per se. It can even be applied to an arbitrary mixture of these content types. In fact, the content of the tagged resources does not have to be accessible in order to manage them in a folksonomy system.
  • Evaluation. We have applied our method to a large-scale dataset from an actual folksonomy system.
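The second step mentioned above, comparing rankings of two snapshots to find winners and losers, can be sketched as follows. This is an illustration only: it uses a plain difference of rank positions, which is an assumption made here and not the trend measure defined in the paper, and the scores and function names are hypothetical.

```python
def rank_positions(scores):
    """Map each item to its 1-based position, best score first."""
    ordered = sorted(scores, key=lambda item: (-scores[item], item))
    return {item: pos for pos, item in enumerate(ordered, start=1)}


def rank_changes(scores_before, scores_after):
    """Positions gained (positive) or lost (negative) per item.

    Assumption: an item missing from a snapshot is placed one position
    below the last ranked item of that snapshot.
    """
    before = rank_positions(scores_before)
    after = rank_positions(scores_after)
    default_before = len(before) + 1
    default_after = len(after) + 1
    items = set(before) | set(after)
    return {
        item: before.get(item, default_before) - after.get(item, default_after)
        for item in items
    }


def winners_and_losers(scores_before, scores_after, k=3):
    """Up to k items that rose most and up to k items that fell most."""
    changes = rank_changes(scores_before, scores_after)
    ordered = sorted(changes, key=lambda item: (-changes[item], item))
    winners = [item for item in ordered if changes[item] > 0][:k]
    losers = [item for item in reversed(ordered) if changes[item] < 0][:k]
    return winners, losers


# Hypothetical topic-specific tag scores for two snapshots.
before = {"ajax": 0.10, "web2.0": 0.30, "java": 0.40, "xml": 0.20}
after = {"ajax": 0.45, "web2.0": 0.25, "java": 0.20, "xml": 0.10}
winners, losers = winners_and_losers(before, after)
```

With these toy scores, `ajax` moves from position 4 to position 1 and is the only winner, while `java` and `xml` lose two and one positions respectively. The same functions apply unchanged to users or resources, since only the ranking is compared.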
The paper is organized as follows. In the next section, we describe our ranking and trend detection approach. In Section 3, we apply the approach to a large-scale dataset, a one-year snapshot of the del.icio.us system. Section 4 discusses related work, and Section 5 concludes with an outlook on future topics in this field.

2 Trend Detection in Folksonomies

To discover trends in a social resource sharing system, we need snapshots of its folksonomy at different points in time. For each snapshot, we need a ranking, so that we can compare the rankings of consecutive snapshots. As we also want to discover topic-specific trends, we additionally need a ranking method that can focus on a specific topic. We make use of our search and ranking algorithm FolkRank [10], which we summarize below.
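The authors' summary of FolkRank is not part of this excerpt. As a rough illustration of what a "differential adaptation of PageRank" on folksonomy data can look like, the sketch below spreads weight over an undirected graph of users, tags, and resources and subtracts a baseline ranking from a topic-focused one. The edge weighting, the damping value of 0.7, the size of the preference boost, and all names are assumptions made here; the authoritative definition is the one in [10].

```python
from collections import defaultdict


def build_graph(assignments):
    """Undirected weighted graph over user, tag and resource nodes.

    Assumption: each (user, tag, resource) triple adds weight 1 to each of
    the three pairwise edges it induces.
    """
    weights = defaultdict(lambda: defaultdict(float))
    for user, tag, resource in assignments:
        nodes = [("user", user), ("tag", tag), ("resource", resource)]
        for a in nodes:
            for b in nodes:
                if a != b:
                    weights[a][b] += 1.0
    return weights


def weight_spreading(weights, preference, damping, iterations=100):
    """Iterate w <- damping * A w + (1 - damping) * p.

    A distributes each node's weight to its neighbours in proportion to
    the edge weights, so the total weight stays constant.
    """
    nodes = list(weights)
    degree = {n: sum(weights[n].values()) for n in nodes}
    w = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_w = {n: (1.0 - damping) * preference.get(n, 0.0) for n in nodes}
        for source in nodes:
            for target, weight in weights[source].items():
                new_w[target] += damping * w[source] * weight / degree[source]
        w = new_w
    return w


def differential_ranking(assignments, preferred, damping=0.7):
    """Topic-focused ranking minus a baseline ranking without preference."""
    weights = build_graph(assignments)
    nodes = list(weights)
    uniform = {n: 1.0 / len(nodes) for n in nodes}
    # Baseline: damping 1.0 means the preference vector has no influence.
    baseline = weight_spreading(weights, uniform, damping=1.0)
    # Preference vector with extra mass on the topic nodes, summing to 1.
    raw = {n: 1.0 for n in nodes}
    for n in preferred:
        if n in raw:
            raw[n] += len(nodes)
    total = sum(raw.values())
    preference = {n: value / total for n, value in raw.items()}
    focused = weight_spreading(weights, preference, damping)
    return {n: focused[n] - baseline[n] for n in nodes}


# Hypothetical toy folksonomy; the topic of interest is the tag "python".
toy = [
    ("alice", "python", "r1"), ("alice", "programming", "r1"),
    ("bob", "python", "r2"), ("bob", "web", "r2"),
    ("carol", "music", "r3"), ("carol", "jazz", "r3"),
    ("dave", "music", "r4"), ("dave", "web", "r4"),
]
scores = differential_ranking(toy, preferred=[("tag", "python")])
```

Subtracting the baseline removes the advantage that globally popular nodes have in every ranking, so what remains is the gain or loss attributable to the chosen topic. Computing such scores for each snapshot yields the per-snapshot rankings that the trend comparison needs.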

Similar resources

Learning Folksonomies for Trend Detection in Task-Oriented Dialogues

Dialogues are created by the interaction between people, who talk about different kinds of topics using natural language. Task-oriented dialogue aims at the solution of a given task in a given domain. Folksonomies are knowledge structures composed of users, tags and resources. Folksonomies emerge from the tagging process in collaborative tagging systems. Dialogues and folksonomies have in common their ...

Detection of Overlapping Communities in Social Tagging Systems

Some of the most popular sites in the Web today are social tagging systems or folksonomies (e.g. Delicious, Flickr, LastFm etc.) where users share resources and collaboratively annotate resources with tags which help in the search, personalized recommendation and organization of the resources. Folksonomies are modelled as tripartite (user-resource-tag) hypergraphs in order to study their networ...

Modeling User Expertise in Folksonomies by Fusing Multi-type Features

A folksonomy is an online collaborative tagging system that offers a new open platform for content annotation with an uncontrolled vocabulary. As folksonomies are gaining in popularity, expert search and spammer detection in folksonomies attract more and more attention. However, most previous work is limited to some folksonomy features. In this paper, we introduce a comprehensive us...

Folksonomies versus Controlled Vocabularies: Theoretical Approaches

The purpose is to review the literature in order to identify the epistemological and theoretical approaches to folksonomies and to compare them with the theoretical foundations of controlled vocabularies. This paper is a library research study. A review of the literature to identify and review theoretical approaches related to folksonomies includes Critical social theory, Social constructivism, Relativism, Derr...

Query Expansion in Folksonomies

People share resources in folksonomies and add tags to these resources. There are often only a few tags associated with each resource, which makes the data available in folksonomies extremely sparse. Sparseness in folksonomies makes searching for resources difficult. Many resources relevant to a query might not be retrieved if they are not associated with the queried terms. One possible way to ...

Harnessing Folksonomies for Search

This paper analyses folksonomies, an emergent web 2.0 technology. Folksonomies are found to be primarily a social dynamic phenomenon, and several key tensions are hypothesised that keep the folksonomy community vibrant. Strengths and weaknesses of folksonomies are analyzed w.r.t applicability to browsing and search, and suggestions are given on how to alleviate search problems by bringing in ad...

Publication date: 2006